Overview
Dataset statistics
| Number of variables | 19 |
|---|---|
| Number of observations | 1048575 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 25 |
| Duplicate rows (%) | < 0.1% |
| Total size in memory | 308.0 MiB |
| Average record size in memory | 308.0 B |
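The two memory figures and the duplicate percentage can be cross-checked with a few lines of arithmetic. This is a minimal sketch: the `percent_label` helper is hypothetical, and its "< 0.1%" rule is an assumption inferred from the rows shown in this report, not taken from the profiler's source.

```python
# Figures copied from the Dataset statistics table.
N_ROWS = 1_048_575
N_DUPLICATES = 25
TOTAL_MIB = 308.0


def percent_label(share: float) -> str:
    """Format a share the way the report appears to (assumed rule)."""
    # Non-zero shares that round to 0.000 are shown as "< 0.1%".
    if share > 0 and round(share, 3) == 0:
        return "< 0.1%"
    return f"{share * 100:.1f}%"


# Average record size = total bytes / number of observations.
avg_record_bytes = TOTAL_MIB * 1024 * 1024 / N_ROWS
duplicate_label = percent_label(N_DUPLICATES / N_ROWS)

print(f"{avg_record_bytes:.1f} B")
print(duplicate_label)
```

Both values agree with the table: roughly 308 bytes per record, and a duplicate share too small to display at one decimal place.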
Variable types
| Categorical | 6 |
|---|---|
| DateTime | 2 |
| Numeric | 10 |
| Boolean | 1 |
Alerts
| Alert | Type |
|---|---|
| Dataset has 25 (< 0.1%) duplicate rows | Duplicates |
| VendorID is highly overall correlated with extra | High correlation |
| congestion_surcharge is highly overall correlated with improvement_surcharge and 2 other fields | High correlation |
| extra is highly overall correlated with VendorID | High correlation |
| fare_amount is highly overall correlated with total_amount and 1 other field | High correlation |
| improvement_surcharge is highly overall correlated with congestion_surcharge and 1 other field | High correlation |
| mta_tax is highly overall correlated with congestion_surcharge and 1 other field | High correlation |
| tip_amount is highly overall correlated with total_amount | High correlation |
| total_amount is highly overall correlated with congestion_surcharge and 3 other fields | High correlation |
| trip_distance is highly overall correlated with fare_amount and 1 other field | High correlation |
| store_and_fwd_flag is highly imbalanced (96.3%) | Imbalance |
| payment_type is highly imbalanced (57.8%) | Imbalance |
| mta_tax is highly imbalanced (92.0%) | Imbalance |
| improvement_surcharge is highly imbalanced (95.5%) | Imbalance |
| congestion_surcharge is highly imbalanced (74.6%) | Imbalance |
| Airport_fee is highly imbalanced (70.9%) | Imbalance |
| trip_distance is highly skewed (γ1 = 772.1069164) | Skewed |
| passenger_count has 11322 (1.1%) zeros | Zeros |
| trip_distance has 15052 (1.4%) zeros | Zeros |
| extra has 415665 (39.6%) zeros | Zeros |
| tip_amount has 250931 (23.9%) zeros | Zeros |
| tolls_amount has 966772 (92.2%) zeros | Zeros |
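The imbalance percentages in the alerts are consistent with a score of one minus the normalised Shannon entropy of the category counts. The sketch below is an assumption about how the score is derived, checked only against the counts in this report; `imbalance_score` is a hypothetical helper, not the profiler's API.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k) for a list of k category counts."""
    if len(counts) < 2:
        raise ValueError("need at least two categories")
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(counts))


# Counts copied from the store_and_fwd_flag and payment_type sections.
store_and_fwd = imbalance_score([1_044_416, 4_159])
payment_type = imbalance_score([842_317, 180_389, 18_252, 7_617])

print(f"{store_and_fwd:.1%}", f"{payment_type:.1%}")
```

A score near 100% means almost all rows fall into one category; a score of 0% would mean perfectly even categories.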
Reproduction
| Analysis started | 2024-12-09 21:48:22.304585 |
|---|---|
| Analysis finished | 2024-12-09 21:52:10.965520 |
| Duration | 3 minutes and 48.66 seconds |
| Software version | ydata-profiling v4.12.1 |
| Download configuration | config.json |
Variables
VendorID
Categorical
High correlation 
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 50.0 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 795759 | 75.9% |
| 1 | 252816 | 24.1% |
tpep_pickup_datetime
DateTime
| Distinct | 17488 |
|---|---|
| Distinct (%) | 1.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 8.0 MiB |
| Minimum | 2002-12-31 22:59:00 |
|---|---|
| Maximum | 2024-12-01 23:59:00 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
tpep_dropoff_datetime
DateTime
| Distinct | 17543 |
|---|---|
| Distinct (%) | 1.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 8.0 MiB |
| Minimum | 2002-12-31 23:05:00 |
|---|---|
| Maximum | 2024-12-01 23:59:00 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
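The DateTime minima above fall in 2002, far from the 2024 timestamps in the rest of the report, so a range check is a natural follow-up. The sketch below is only illustrative: the window bounds are hypothetical, since the report does not state the intended date range.

```python
from datetime import datetime

# Hypothetical expected window; not stated anywhere in the report.
WINDOW_START = datetime(2024, 1, 1)
WINDOW_END = datetime(2025, 1, 1)


def out_of_window(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse timestamp strings and return those outside the window."""
    parsed = (datetime.strptime(ts, fmt) for ts in timestamps)
    return [ts for ts in parsed if not (WINDOW_START <= ts < WINDOW_END)]


# Minimum and maximum pickup times as reported in the DateTime tables.
flagged = out_of_window(["2002-12-31 22:59:00", "2024-12-01 23:59:00"])
print(flagged)
```

Under this assumed window only the 2002 value is flagged.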
passenger_count
Real number (ℝ)
Zeros 
| Distinct | 9 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.3653024 |
| Minimum | 0 |
|---|---|
| Maximum | 8 |
| Zeros | 11322 |
| Zeros (%) | 1.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 3 |
| Maximum | 8 |
| Range | 8 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.87298052 |
|---|---|
| Coefficient of variation (CV) | 0.6394045 |
| Kurtosis | 9.3016177 |
| Mean | 1.3653024 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.8581338 |
| Sum | 1431622 |
| Variance | 0.762095 |
| Monotonicity | Not monotonic |
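Two of the descriptive statistics follow directly from the mean and standard deviation. As a quick consistency check, using the passenger_count figures above:

```python
# passenger_count figures copied from the Descriptive statistics table.
mean = 1.3653024
std = 0.87298052

cv = std / mean       # coefficient of variation
variance = std ** 2   # variance is the squared standard deviation

print(f"CV = {cv:.7f}, variance = {variance:.6f}")
```

Both results agree with the reported Coefficient of variation and Variance to the precision shown.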
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 798851 | 76.2% |
| 2 | 155991 | 14.9% |
| 3 | 37704 | 3.6% |
| 4 | 24078 | 2.3% |
| 5 | 12466 | 1.2% |
| 0 | 11322 | 1.1% |
| 6 | 8132 | 0.8% |
| 8 | 26 | < 0.1% |
| 7 | 5 | < 0.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 11322 | 1.1% |
| 1 | 798851 | 76.2% |
| 2 | 155991 | 14.9% |
| 3 | 37704 | 3.6% |
| 4 | 24078 | 2.3% |
| 5 | 12466 | 1.2% |
| 6 | 8132 | 0.8% |
| 7 | 5 | < 0.1% |
| 8 | 26 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 8 | 26 | < 0.1% |
| 7 | 5 | < 0.1% |
| 6 | 8132 | 0.8% |
| 5 | 12466 | 1.2% |
| 4 | 24078 | 2.3% |
| 3 | 37704 | 3.6% |
| 2 | 155991 | 14.9% |
| 1 | 798851 | 76.2% |
| 0 | 11322 | 1.1% |
trip_distance
Real number (ℝ)
High correlation  Skewed  Zeros 
| Distinct | 3774 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.4430206 |
| Minimum | 0 |
|---|---|
| Maximum | 10879.28 |
| Zeros | 15052 |
| Zeros (%) | 1.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.46 |
| Q1 | 1 |
| median | 1.7 |
| Q3 | 3.28 |
| 95-th percentile | 16.1 |
| Maximum | 10879.28 |
| Range | 10879.28 |
| Interquartile range (IQR) | 2.28 |
Descriptive statistics
| Standard deviation | 11.675558 |
|---|---|
| Coefficient of variation (CV) | 3.3910799 |
| Kurtosis | 718118.28 |
| Mean | 3.4430206 |
| Median Absolute Deviation (MAD) | 0.88 |
| Skewness | 772.10692 |
| Sum | 3610265.3 |
| Variance | 136.31865 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 15052 | 1.4% |
| 0.9 | 14001 | 1.3% |
| 0.8 | 13787 | 1.3% |
| 1 | 13727 | 1.3% |
| 1.1 | 13367 | 1.3% |
| 0.7 | 13183 | 1.3% |
| 1.2 | 12817 | 1.2% |
| 1.3 | 12273 | 1.2% |
| 1.4 | 11575 | 1.1% |
| 0.6 | 11543 | 1.1% |
| Other values (3764) | 917250 | 87.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 15052 | 1.4% |
| 0.01 | 892 | 0.1% |
| 0.02 | 630 | 0.1% |
| 0.03 | 470 | < 0.1% |
| 0.04 | 373 | < 0.1% |
| 0.05 | 314 | < 0.1% |
| 0.06 | 254 | < 0.1% |
| 0.07 | 218 | < 0.1% |
| 0.08 | 211 | < 0.1% |
| 0.09 | 189 | < 0.1% |
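Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10879.28 | 1 | < 0.1% |
| 971.8 | 1 | < 0.1% |
| 964.6 | 1 | < 0.1% |
| 233.25 | 1 | < 0.1% |
| 210.82 | 1 | < 0.1% |
| 176.43 | 1 | < 0.1% |
| 142.62 | 1 | < 0.1% |
| 115.75 | 1 | < 0.1% |
| 111.57 | 1 | < 0.1% |
| 101.28 | 1 | < 0.1% |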
RatecodeID
Real number (ℝ)
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.1558734 |
| Minimum | 1 |
|---|---|
| Maximum | 99 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.0 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 99 |
| Range | 98 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 10.196364 |
|---|---|
| Coefficient of variation (CV) | 4.7295743 |
| Kurtosis | 86.064939 |
| Mean | 2.1558734 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 9.3760771 |
| Sum | 2260595 |
| Variance | 103.96583 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 981137 | 93.6% |
| 2 | 42416 | 4.0% |
| 99 | 11476 | 1.1% |
| 5 | 7689 | 0.7% |
| 3 | 3373 | 0.3% |
| 4 | 2483 | 0.2% |
| 6 | 1 | < 0.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 981137 | 93.6% |
| 2 | 42416 | 4.0% |
| 3 | 3373 | 0.3% |
| 4 | 2483 | 0.2% |
| 5 | 7689 | 0.7% |
| 6 | 1 | < 0.1% |
| 99 | 11476 | 1.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 99 | 11476 | 1.1% |
| 6 | 1 | < 0.1% |
| 5 | 7689 | 0.7% |
| 4 | 2483 | 0.2% |
| 3 | 3373 | 0.3% |
| 2 | 42416 | 4.0% |
| 1 | 981137 | 93.6% |
store_and_fwd_flag
Boolean
Imbalance 
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.0 MiB |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| False | 1044416 | 99.6% |
| True | 4159 | 0.4% |
PULocationID
Real number (ℝ)
| Distinct | 247 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 165.18883 |
| Minimum | 1 |
|---|---|
| Maximum | 265 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.0 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 48 |
| Q1 | 132 |
| median | 161 |
| Q3 | 233 |
| 95-th percentile | 249 |
| Maximum | 265 |
| Range | 264 |
| Interquartile range (IQR) | 101 |
Descriptive statistics
| Standard deviation | 63.082007 |
|---|---|
| Coefficient of variation (CV) | 0.38187817 |
| Kurtosis | -0.81467577 |
| Mean | 165.18883 |
| Median Absolute Deviation (MAD) | 54 |
| Skewness | -0.24910261 |
| Sum | 1.7321287 × 10^8 |
| Variance | 3979.3396 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 132 | 62088 | 5.9% |
| 161 | 50493 | 4.8% |
| 237 | 48796 | 4.7% |
| 236 | 47161 | 4.5% |
| 142 | 39224 | 3.7% |
| 186 | 38951 | 3.7% |
| 230 | 38121 | 3.6% |
| 162 | 37813 | 3.6% |
| 138 | 34626 | 3.3% |
| 239 | 32117 | 3.1% |
| Other values (237) | 619185 | 59.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 144 | < 0.1% |
| 2 | 2 | < 0.1% |
| 3 | 33 | < 0.1% |
| 4 | 837 | 0.1% |
| 6 | 9 | < 0.1% |
| 7 | 515 | < 0.1% |
| 8 | 5 | < 0.1% |
| 9 | 19 | < 0.1% |
| 10 | 373 | < 0.1% |
| 11 | 18 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 265 | 609 | 0.1% |
| 264 | 3718 | 0.4% |
| 263 | 20364 | 1.9% |
| 262 | 14029 | 1.3% |
| 261 | 4958 | 0.5% |
| 260 | 263 | < 0.1% |
| 259 | 36 | < 0.1% |
| 258 | 48 | < 0.1% |
| 257 | 33 | < 0.1% |
| 256 | 184 | < 0.1% |
DOLocationID
Real number (ℝ)
| Distinct | 260 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 164.62893 |
| Minimum | 1 |
|---|---|
| Maximum | 265 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.0 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 43 |
| Q1 | 114 |
| median | 162 |
| Q3 | 234 |
| 95-th percentile | 261 |
| Maximum | 265 |
| Range | 264 |
| Interquartile range (IQR) | 120 |
Descriptive statistics
| Standard deviation | 69.495842 |
|---|---|
| Coefficient of variation (CV) | 0.42213626 |
| Kurtosis | -0.90044768 |
| Mean | 164.62893 |
| Median Absolute Deviation (MAD) | 68 |
| Skewness | -0.37621195 |
| Sum | 1.7262578 × 10^8 |
| Variance | 4829.6721 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 236 | 49431 | 4.7% |
| 237 | 43945 | 4.2% |
| 161 | 40372 | 3.9% |
| 239 | 33436 | 3.2% |
| 230 | 33201 | 3.2% |
| 142 | 33192 | 3.2% |
| 170 | 30470 | 2.9% |
| 162 | 29932 | 2.9% |
| 141 | 29417 | 2.8% |
| 238 | 27882 | 2.7% |
| Other values (250) | 697297 | 66.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 2966 | 0.3% |
| 2 | 1 | < 0.1% |
| 3 | 95 | < 0.1% |
| 4 | 3751 | 0.4% |
| 5 | 6 | < 0.1% |
| 6 | 25 | < 0.1% |
| 7 | 2931 | 0.3% |
| 8 | 17 | < 0.1% |
| 9 | 114 | < 0.1% |
| 10 | 1106 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 265 | 4667 | 0.4% |
| 264 | 5793 | 0.6% |
| 263 | 22464 | 2.1% |
| 262 | 16885 | 1.6% |
| 261 | 4659 | 0.4% |
| 260 | 864 | 0.1% |
| 259 | 145 | < 0.1% |
| 258 | 278 | < 0.1% |
| 257 | 433 | < 0.1% |
| 256 | 1950 | 0.2% |
payment_type
Categorical
Imbalance 
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 50.0 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 842317 | 80.3% |
| 2 | 180389 | 17.2% |
| 4 | 18252 | 1.7% |
| 3 | 7617 | 0.7% |
fare_amount
Real number (ℝ)
High correlation 
| Distinct | 1936 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 18.590696 |
| Minimum | -700 |
|---|---|
| Maximum | 1616.5 |
| Zeros | 307 |
| Zeros (%) | < 0.1% |
| Negative | 13716 |
| Negative (%) | 1.3% |
| Memory size | 8.0 MiB |
Quantile statistics
| Minimum | -700 |
|---|---|
| 5-th percentile | 5.1 |
| Q1 | 8.6 |
| median | 12.8 |
| Q3 | 20.5 |
| 95-th percentile | 70 |
| Maximum | 1616.5 |
| Range | 2316.5 |
| Interquartile range (IQR) | 11.9 |
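The last two rows of the quantile table are derived from the others: Range is maximum minus minimum, and the IQR is Q3 minus Q1. A small check with the fare_amount figures above, using `Decimal` to avoid binary floating-point noise:

```python
from decimal import Decimal

# fare_amount quantiles copied from the table above.
quantiles = {
    "min": Decimal("-700"),
    "q1": Decimal("8.6"),
    "q3": Decimal("20.5"),
    "max": Decimal("1616.5"),
}

value_range = quantiles["max"] - quantiles["min"]
iqr = quantiles["q3"] - quantiles["q1"]

print(value_range, iqr)
```

The wide range next to a narrow IQR reflects how far the extreme fares (including negative ones) sit from the bulk of the distribution.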
Descriptive statistics
| Standard deviation | 19.313701 |
|---|---|
| Coefficient of variation (CV) | 1.0388907 |
| Kurtosis | 88.010266 |
| Mean | 18.590696 |
| Median Absolute Deviation (MAD) | 4.9 |
| Skewness | 3.5970617 |
| Sum | 19493739 |
| Variance | 373.01904 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 8.6 | 51074 | 4.9% |
| 7.9 | 50880 | 4.9% |
| 9.3 | 50373 | 4.8% |
| 10 | 49326 | 4.7% |
| 7.2 | 48839 | 4.7% |
| 10.7 | 46291 | 4.4% |
| 11.4 | 43798 | 4.2% |
| 6.5 | 43452 | 4.1% |
| 70 | 41686 | 4.0% |
| 12.1 | 40744 | 3.9% |
| Other values (1926) | 582112 | 55.5% |
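Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -700 | 1 | < 0.1% |
| -600 | 1 | < 0.1% |
| -509.8 | 1 | < 0.1% |
| -439.1 | 1 | < 0.1% |
| -423 | 1 | < 0.1% |
| -404.1 | 1 | < 0.1% |
| -367.7 | 1 | < 0.1% |
| -367 | 1 | < 0.1% |
| -351.6 | 1 | < 0.1% |
| -300 | 1 | < 0.1% |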
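Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1616.5 | 1 | < 0.1% |
| 912.3 | 1 | < 0.1% |
| 820 | 1 | < 0.1% |
| 700 | 1 | < 0.1% |
| 678.5 | 1 | < 0.1% |
| 620.4 | 1 | < 0.1% |
| 600 | 1 | < 0.1% |
| 536.4 | 1 | < 0.1% |
| 530.8 | 1 | < 0.1% |
| 520.08 | 1 | < 0.1% |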
extra
Real number (ℝ)
High correlation  Zeros 
| Distinct | 34 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.578604 |
| Minimum | -7.5 |
|---|---|
| Maximum | 11.75 |
| Zeros | 415665 |
| Zeros (%) | 39.6% |
| Negative | 7102 |
| Negative (%) | 0.7% |
| Memory size | 8.0 MiB |
Quantile statistics
| Minimum | -7.5 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 2.5 |
| 95-th percentile | 5 |
| Maximum | 11.75 |
| Range | 19.25 |
| Interquartile range (IQR) | 2.5 |
Descriptive statistics
| Standard deviation | 1.8518877 |
|---|---|
| Coefficient of variation (CV) | 1.1731173 |
| Kurtosis | 2.4729546 |
| Mean | 1.578604 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 1.2906061 |
| Sum | 1655284.7 |
| Variance | 3.4294881 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 415665 | 39.6% |
| 2.5 | 271188 | 25.9% |
| 1 | 189948 | 18.1% |
| 5 | 78415 | 7.5% |
| 3.5 | 51361 | 4.9% |
| 7.5 | 9107 | 0.9% |
| 6 | 8664 | 0.8% |
| 9.25 | 4254 | 0.4% |
| -1 | 4027 | 0.4% |
| 4.25 | 4011 | 0.4% |
| Other values (24) | 11935 | 1.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -7.5 | 87 | < 0.1% |
| -6 | 129 | < 0.1% |
| -5 | 487 | < 0.1% |
| -2.5 | 2371 | 0.2% |
| -1.5 | 1 | < 0.1% |
| -1 | 4027 | 0.4% |
| 0 | 415665 | 39.6% |
| 0.01 | 2 | < 0.1% |
| 0.02 | 1 | < 0.1% |
| 0.06 | 2 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 11.75 | 1042 | 0.1% |
| 10.25 | 1037 | 0.1% |
| 10 | 249 | < 0.1% |
| 9.95 | 1 | < 0.1% |
| 9.25 | 4254 | 0.4% |
| 8.5 | 149 | < 0.1% |
| 7.75 | 888 | 0.1% |
| 7.5 | 9107 | 0.9% |
| 6.75 | 1590 | 0.2% |
| 6 | 8664 | 0.8% |
mta_tax
Categorical
High correlation  Imbalance 
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 52.0 MiB |
Length
| Max length | 4 |
|---|---|
| Median length | 3 |
| Mean length | 3.0127297 |
| Min length | 3 |
Unique
| Unique | 2 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | 0.5 |
|---|---|
| 2nd row | 0.5 |
| 3rd row | 0.5 |
| 4th row | 0.5 |
| 5th row | 0.5 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.5 | 1023566 | 97.6% |
| -0.5 | 13348 | 1.3% |
| 0.0 | 11659 | 1.1% |
| 1.6 | 1 | < 0.1% |
| 0.8 | 1 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1060233 | 33.6% |
| . | 1048575 | 33.2% |
| 5 | 1036914 | 32.8% |
| - | 13348 | 0.4% |
| 1 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
tip_amount
Real number (ℝ)
High correlation  Zeros 
| Distinct | 3374 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.4163597 |
| Minimum | -80 |
|---|---|
| Maximum | 422.7 |
| Zeros | 250931 |
| Zeros (%) | 23.9% |
| Negative | 48 |
| Negative (%) | < 0.1% |
| Memory size | 8.0 MiB |
Quantile statistics
| Minimum | -80 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 2.72 |
| Q3 | 4.2 |
| 95-th percentile | 11.77 |
| Maximum | 422.7 |
| Range | 502.7 |
| Interquartile range (IQR) | 3.2 |
Descriptive statistics
| Standard deviation | 4.0528675 |
|---|---|
| Coefficient of variation (CV) | 1.1863117 |
| Kurtosis | 218.30725 |
| Mean | 3.4163597 |
| Median Absolute Deviation (MAD) | 1.72 |
| Skewness | 5.5523353 |
| Sum | 3582309.3 |
| Variance | 16.425735 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 250931 | 23.9% |
| 2 | 51527 | 4.9% |
| 1 | 39707 | 3.8% |
| 3 | 26627 | 2.5% |
| 5 | 15154 | 1.4% |
| 2.8 | 13342 | 1.3% |
| 4 | 11804 | 1.1% |
| 3.5 | 11769 | 1.1% |
| 2.1 | 11225 | 1.1% |
| 1.5 | 11189 | 1.1% |
| Other values (3364) | 605300 | 57.7% |
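Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -80 | 1 | < 0.1% |
| -65.1 | 1 | < 0.1% |
| -22.24 | 1 | < 0.1% |
| -22 | 1 | < 0.1% |
| -17.59 | 1 | < 0.1% |
| -16.19 | 2 | < 0.1% |
| -8.18 | 1 | < 0.1% |
| -6.65 | 1 | < 0.1% |
| -3.36 | 1 | < 0.1% |
| -3 | 2 | < 0.1% |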
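Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 422.7 | 1 | < 0.1% |
| 303 | 1 | < 0.1% |
| 300 | 1 | < 0.1% |
| 280 | 1 | < 0.1% |
| 144 | 1 | < 0.1% |
| 140 | 1 | < 0.1% |
| 130 | 1 | < 0.1% |
| 110 | 1 | < 0.1% |
| 104 | 1 | < 0.1% |
| 103.65 | 1 | < 0.1% |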
tolls_amount
Real number (ℝ)
Zeros 
| Distinct | 700 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.57740556 |
| Minimum | -60 |
|---|---|
| Maximum | 101.69 |
| Zeros | 966772 |
| Zeros (%) | 92.2% |
| Negative | 866 |
| Negative (%) | 0.1% |
| Memory size | 8.0 MiB |
Quantile statistics
| Minimum | -60 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 6.94 |
| Maximum | 101.69 |
| Range | 161.69 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 2.2212313 |
|---|---|
| Coefficient of variation (CV) | 3.8469172 |
| Kurtosis | 56.889985 |
| Mean | 0.57740556 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 5.004907 |
| Sum | 605453.03 |
| Variance | 4.9338687 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 966772 | 92.2% |
| 6.94 | 74237 | 7.1% |
| 12.75 | 784 | 0.1% |
| -6.94 | 707 | 0.1% |
| 3.18 | 554 | 0.1% |
| 14.75 | 510 | < 0.1% |
| 13.88 | 426 | < 0.1% |
| 13.38 | 412 | < 0.1% |
| 15.38 | 236 | < 0.1% |
| 5.2 | 138 | < 0.1% |
| Other values (690) | 3799 | 0.4% |
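Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -60 | 1 | < 0.1% |
| -55.34 | 1 | < 0.1% |
| -54.02 | 1 | < 0.1% |
| -52.57 | 1 | < 0.1% |
| -45 | 1 | < 0.1% |
| -42.75 | 1 | < 0.1% |
| -40 | 1 | < 0.1% |
| -39.38 | 1 | < 0.1% |
| -38.02 | 1 | < 0.1% |
| -32.75 | 1 | < 0.1% |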
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 101.69 | 1 | < 0.1% |
| 87 | 1 | < 0.1% |
| 83 | 1 | < 0.1% |
| 81 | 1 | < 0.1% |
| 80 | 4 | < 0.1% |
| 62.75 | 1 | < 0.1% |
| 60 | 1 | < 0.1% |
| 58.63 | 1 | < 0.1% |
| 57.32 | 1 | < 0.1% |
| 55.55 | 1 | < 0.1% |
improvement_surcharge
Categorical
High correlation  Imbalance 
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 52.0 MiB |
Length
| Max length | 4 |
|---|---|
| Median length | 3 |
| Mean length | 3.0131197 |
| Min length | 3 |
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | 1.0 |
|---|---|
| 2nd row | 1.0 |
| 3rd row | 1.0 |
| 4th row | 1.0 |
| 5th row | 1.0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.0 | 1034457 | 98.7% |
| -1.0 | 13756 | 1.3% |
| 0.0 | 263 | < 0.1% |
| 0.3 | 98 | < 0.1% |
| -0.3 | 1 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1048838 | 33.2% |
| . | 1048575 | 33.2% |
| 1 | 1048213 | 33.2% |
| - | 13757 | 0.4% |
| 3 | 99 | < 0.1% |
total_amount
Real number (ℝ)
High correlation 
| Distinct | 13721 |
|---|---|
| Distinct (%) | 1.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 27.445239 |
| Minimum | -695.75 |
|---|---|
| Maximum | 1617.5 |
| Zeros | 154 |
| Zeros (%) | < 0.1% |
| Negative | 13757 |
| Negative (%) | 1.3% |
| Memory size | 8.0 MiB |
Quantile statistics
| Minimum | -695.75 |
|---|---|
| 5-th percentile | 10.8 |
| Q1 | 15.3 |
| median | 20.02 |
| Q3 | 29 |
| 95-th percentile | 82.79 |
| Maximum | 1617.5 |
| Range | 2313.25 |
| Interquartile range (IQR) | 13.7 |
Descriptive statistics
| Standard deviation | 24.051793 |
|---|---|
| Coefficient of variation (CV) | 0.87635574 |
| Kurtosis | 43.181553 |
| Mean | 27.445239 |
| Median Absolute Deviation (MAD) | 5.74 |
| Skewness | 2.9266258 |
| Sum | 28778391 |
| Variance | 578.48873 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 16.8 | 16042 | 1.5% |
| 12.6 | 15513 | 1.5% |
| 21 | 12865 | 1.2% |
| 15.96 | 9150 | 0.9% |
| 15.12 | 9083 | 0.9% |
| 14.28 | 8956 | 0.9% |
| 17.64 | 8423 | 0.8% |
| 18.48 | 8266 | 0.8% |
| 14 | 8166 | 0.8% |
| 13.44 | 8134 | 0.8% |
| Other values (13711) | 943977 | 90.0% |
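Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -695.75 | 1 | < 0.1% |
| -591 | 1 | < 0.1% |
| -464.67 | 1 | < 0.1% |
| -426.54 | 1 | < 0.1% |
| -416.34 | 1 | < 0.1% |
| -415.75 | 1 | < 0.1% |
| -396.2 | 1 | < 0.1% |
| -374.04 | 1 | < 0.1% |
| -369.75 | 1 | < 0.1% |
| -315.47 | 1 | < 0.1% |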
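Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1617.5 | 1 | < 0.1% |
| 940.93 | 1 | < 0.1% |
| 821 | 1 | < 0.1% |
| 715.75 | 1 | < 0.1% |
| 696 | 1 | < 0.1% |
| 630.09 | 1 | < 0.1% |
| 601 | 1 | < 0.1% |
| 586.6 | 1 | < 0.1% |
| 560.18 | 1 | < 0.1% |
| 551.59 | 1 | < 0.1% |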
congestion_surcharge
Categorical
High correlation  Imbalance 
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 52.0 MiB |
Length
| Max length | 4 |
|---|---|
| Median length | 3 |
| Mean length | 3.0106325 |
| Min length | 3 |
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | 2.5 |
|---|---|
| 2nd row | 2.5 |
| 3rd row | 2.5 |
| 4th row | 2.5 |
| 5th row | 2.5 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.5 | 946669 | 90.3% |
| 0.0 | 90757 | 8.7% |
| -2.5 | 11148 | 1.1% |
| 0.75 | 1 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| . | 1048575 | 33.2% |
| 5 | 957818 | 30.3% |
| 2 | 957817 | 30.3% |
| 0 | 181515 | 5.7% |
| - | 11148 | 0.4% |
| 7 | 1 | < 0.1% |
Airport_fee
Categorical
Imbalance 
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 52.1 MiB |
Length
| Max length | 5 |
|---|---|
| Median length | 3 |
| Mean length | 3.0954114 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.0 |
|---|---|
| 2nd row | 0.0 |
| 3rd row | 0.0 |
| 4th row | 0.0 |
| 5th row | 0.0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 950470 | 90.6% |
| 1.75 | 96164 | 9.2% |
| -1.75 | 1941 | 0.2% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1900940 | 58.6% |
| . | 1048575 | 32.3% |
| 1 | 98105 | 3.0% |
| 7 | 98105 | 3.0% |
| 5 | 98105 | 3.0% |
| - | 1941 | 0.1% |
Interactions
Correlations
| | Airport_fee | DOLocationID | PULocationID | RatecodeID | VendorID | congestion_surcharge | extra | fare_amount | improvement_surcharge | mta_tax | passenger_count | payment_type | store_and_fwd_flag | tip_amount | tolls_amount | total_amount | trip_distance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport_fee | 1.000 | 0.074 | 0.396 | 0.034 | 0.056 | 0.321 | 0.480 | 0.252 | 0.265 | 0.257 | 0.041 | 0.149 | 0.006 | 0.073 | 0.395 | 0.270 | 0.003 |
| DOLocationID | 0.074 | 1.000 | 0.085 | -0.055 | 0.011 | 0.125 | -0.001 | -0.103 | 0.014 | 0.087 | -0.012 | 0.036 | 0.004 | -0.006 | -0.050 | -0.091 | -0.102 |
| PULocationID | 0.396 | 0.085 | 1.000 | -0.141 | 0.037 | 0.181 | -0.037 | -0.165 | 0.013 | 0.023 | -0.022 | 0.028 | 0.004 | -0.049 | -0.148 | -0.157 | -0.167 |
| RatecodeID | 0.034 | -0.055 | -0.141 | 1.000 | 0.187 | 0.342 | -0.121 | 0.374 | 0.016 | 0.018 | 0.073 | 0.052 | 0.006 | 0.093 | 0.492 | 0.356 | 0.283 |
| VendorID | 0.056 | 0.011 | 0.037 | 0.187 | 1.000 | 0.073 | 0.597 | 0.060 | 0.066 | 0.064 | 0.200 | 0.061 | 0.100 | 0.004 | 0.024 | 0.065 | 0.000 |
| congestion_surcharge | 0.321 | 0.125 | 0.181 | 0.342 | 0.073 | 1.000 | 0.273 | 0.497 | 0.521 | 0.542 | 0.018 | 0.304 | 0.007 | 0.066 | 0.088 | 0.523 | 0.000 |
| extra | 0.480 | -0.001 | -0.037 | -0.121 | 0.597 | 0.273 | 1.000 | 0.088 | 0.242 | 0.245 | -0.036 | 0.161 | 0.065 | 0.147 | 0.151 | 0.185 | 0.103 |
| fare_amount | 0.252 | -0.103 | -0.165 | 0.374 | 0.060 | 0.497 | 0.088 | 1.000 | 0.459 | 0.454 | 0.066 | 0.304 | 0.006 | 0.441 | 0.433 | 0.965 | 0.891 |
| improvement_surcharge | 0.265 | 0.014 | 0.013 | 0.016 | 0.066 | 0.521 | 0.242 | 0.459 | 1.000 | 0.501 | 0.022 | 0.328 | 0.029 | 0.007 | 0.047 | 0.500 | 0.000 |
| mta_tax | 0.257 | 0.087 | 0.023 | 0.018 | 0.064 | 0.542 | 0.245 | 0.454 | 0.501 | 1.000 | 0.036 | 0.323 | 0.007 | 0.107 | 0.163 | 0.498 | 0.000 |
| passenger_count | 0.041 | -0.012 | -0.022 | 0.073 | 0.200 | 0.018 | -0.036 | 0.066 | 0.022 | 0.036 | 1.000 | 0.039 | 0.046 | 0.014 | 0.070 | 0.063 | 0.054 |
| payment_type | 0.149 | 0.036 | 0.028 | 0.052 | 0.061 | 0.304 | 0.161 | 0.304 | 0.328 | 0.323 | 0.039 | 1.000 | 0.005 | 0.020 | 0.034 | 0.327 | 0.000 |
| store_and_fwd_flag | 0.006 | 0.004 | 0.004 | 0.006 | 0.100 | 0.007 | 0.065 | 0.006 | 0.029 | 0.007 | 0.046 | 0.005 | 1.000 | 0.003 | 0.007 | 0.007 | 0.000 |
| tip_amount | 0.073 | -0.006 | -0.049 | 0.093 | 0.004 | 0.066 | 0.147 | 0.441 | 0.007 | 0.107 | 0.014 | 0.020 | 0.003 | 1.000 | 0.252 | 0.580 | 0.411 |
| tolls_amount | 0.395 | -0.050 | -0.148 | 0.492 | 0.024 | 0.088 | 0.151 | 0.433 | 0.047 | 0.163 | 0.070 | 0.034 | 0.007 | 0.252 | 1.000 | 0.445 | 0.412 |
| total_amount | 0.270 | -0.091 | -0.157 | 0.356 | 0.065 | 0.523 | 0.185 | 0.965 | 0.500 | 0.498 | 0.063 | 0.327 | 0.007 | 0.580 | 0.445 | 1.000 | 0.866 |
| trip_distance | 0.003 | -0.102 | -0.167 | 0.283 | 0.000 | 0.000 | 0.103 | 0.891 | 0.000 | 0.000 | 0.054 | 0.000 | 0.000 | 0.411 | 0.412 | 0.866 | 1.000 |
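The "High correlation" alerts can be read straight off this matrix. The sketch below assumes a strict cut-off of 0.5 on the absolute coefficient; that threshold is an assumption inferred from which pairs are flagged in this report, and `highly_correlated` is a hypothetical helper.

```python
THRESHOLD = 0.5  # assumed cut-off, inferred from the alerts in this report

# Selected coefficients copied from the total_amount row of the matrix.
total_amount_row = {
    "congestion_surcharge": 0.523,
    "fare_amount": 0.965,
    "improvement_surcharge": 0.500,
    "mta_tax": 0.498,
    "tip_amount": 0.580,
    "tolls_amount": 0.445,
    "trip_distance": 0.866,
}


def highly_correlated(row, threshold=THRESHOLD):
    """Return the field names whose |coefficient| exceeds the threshold."""
    return sorted(name for name, r in row.items() if abs(r) > threshold)


print(highly_correlated(total_amount_row))
```

With these rounded values four fields pass the cut-off, which matches the alert "congestion_surcharge and 3 other fields" for total_amount. The 0.500 entry for improvement_surcharge is rounded in the table, so it cannot be judged reliably from the matrix alone.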
Missing values
Sample
First rows
| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | Airport_fee |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 01-01-2024 00:57 | 01-01-2024 01:17 | 1 | 1.72 | 1 | N | 186 | 79 | 2 | 17.7 | 1.0 | 0.5 | 0.00 | 0.0 | 1.0 | 22.70 | 2.5 | 0.00 |
| 1 | 1 | 01-01-2024 00:03 | 01-01-2024 00:09 | 1 | 1.80 | 1 | N | 140 | 236 | 1 | 10.0 | 3.5 | 0.5 | 3.75 | 0.0 | 1.0 | 18.75 | 2.5 | 0.00 |
| 2 | 1 | 01-01-2024 00:17 | 01-01-2024 00:35 | 1 | 4.70 | 1 | N | 236 | 79 | 1 | 23.3 | 3.5 | 0.5 | 3.00 | 0.0 | 1.0 | 31.30 | 2.5 | 0.00 |
| 3 | 1 | 01-01-2024 00:36 | 01-01-2024 00:44 | 1 | 1.40 | 1 | N | 79 | 211 | 1 | 10.0 | 3.5 | 0.5 | 2.00 | 0.0 | 1.0 | 17.00 | 2.5 | 0.00 |
| 4 | 1 | 01-01-2024 00:46 | 01-01-2024 00:52 | 1 | 0.80 | 1 | N | 211 | 148 | 1 | 7.9 | 3.5 | 0.5 | 3.20 | 0.0 | 1.0 | 16.10 | 2.5 | 0.00 |
| 5 | 1 | 01-01-2024 00:54 | 01-01-2024 01:26 | 1 | 4.70 | 1 | N | 148 | 141 | 1 | 29.6 | 3.5 | 0.5 | 6.90 | 0.0 | 1.0 | 41.50 | 2.5 | 0.00 |
| 6 | 2 | 01-01-2024 00:49 | 01-01-2024 01:15 | 2 | 10.82 | 1 | N | 138 | 181 | 1 | 45.7 | 6.0 | 0.5 | 10.00 | 0.0 | 1.0 | 64.95 | 0.0 | 1.75 |
| 7 | 1 | 01-01-2024 00:30 | 01-01-2024 00:58 | 0 | 3.00 | 1 | N | 246 | 231 | 2 | 25.4 | 3.5 | 0.5 | 0.00 | 0.0 | 1.0 | 30.40 | 2.5 | 0.00 |
| 8 | 2 | 01-01-2024 00:26 | 01-01-2024 00:54 | 1 | 5.44 | 1 | N | 161 | 261 | 2 | 31.0 | 1.0 | 0.5 | 0.00 | 0.0 | 1.0 | 36.00 | 2.5 | 0.00 |
| 9 | 2 | 01-01-2024 00:28 | 01-01-2024 00:29 | 1 | 0.04 | 1 | N | 113 | 113 | 2 | 3.0 | 1.0 | 0.5 | 0.00 | 0.0 | 1.0 | 8.00 | 2.5 | 0.00 |
Last rows
| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | Airport_fee |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1048565 | 2 | 13-01-2024 03:08 | 13-01-2024 03:20 | 6 | 2.89 | 1 | N | 211 | 233 | 1 | 14.9 | 1.0 | 0.5 | 1.00 | 0.0 | 1.0 | 20.90 | 2.5 | 0.0 |
| 1048566 | 2 | 13-01-2024 03:33 | 13-01-2024 03:49 | 6 | 3.44 | 1 | N | 148 | 246 | 1 | 19.1 | 1.0 | 0.5 | 4.82 | 0.0 | 1.0 | 28.92 | 2.5 | 0.0 |
| 1048567 | 2 | 13-01-2024 03:20 | 13-01-2024 03:23 | 1 | 0.97 | 1 | N | 90 | 125 | 1 | 6.5 | 1.0 | 0.5 | 2.30 | 0.0 | 1.0 | 13.80 | 2.5 | 0.0 |
| 1048568 | 2 | 13-01-2024 03:44 | 13-01-2024 03:48 | 1 | 0.77 | 1 | N | 125 | 249 | 1 | 6.5 | 1.0 | 0.5 | 1.75 | 0.0 | 1.0 | 13.25 | 2.5 | 0.0 |
| 1048569 | 2 | 13-01-2024 03:05 | 13-01-2024 03:17 | 1 | 2.52 | 1 | N | 79 | 68 | 1 | 14.2 | 1.0 | 0.5 | 3.00 | 0.0 | 1.0 | 22.20 | 2.5 | 0.0 |
| 1048570 | 2 | 13-01-2024 03:23 | 13-01-2024 03:28 | 1 | 1.04 | 1 | N | 246 | 48 | 1 | 7.2 | 1.0 | 0.5 | 2.44 | 0.0 | 1.0 | 14.64 | 2.5 | 0.0 |
| 1048571 | 2 | 13-01-2024 03:41 | 13-01-2024 03:43 | 1 | 0.82 | 1 | N | 246 | 50 | 1 | 5.8 | 1.0 | 0.5 | 2.70 | 0.0 | 1.0 | 13.50 | 2.5 | 0.0 |
| 1048572 | 2 | 13-01-2024 03:49 | 13-01-2024 03:52 | 1 | 0.89 | 1 | N | 246 | 48 | 1 | 6.5 | 1.0 | 0.5 | 2.30 | 0.0 | 1.0 | 13.80 | 2.5 | 0.0 |
| 1048573 | 2 | 13-01-2024 03:24 | 13-01-2024 03:36 | 2 | 3.63 | 1 | N | 114 | 141 | 1 | 17.0 | 1.0 | 0.5 | 2.00 | 0.0 | 1.0 | 24.00 | 2.5 | 0.0 |
| 1048574 | 2 | 13-01-2024 03:52 | 13-01-2024 04:18 | 1 | 8.27 | 1 | N | 164 | 188 | 1 | 36.6 | 1.0 | 0.5 | 10.40 | 0.0 | 1.0 | 52.00 | 2.5 | 0.0 |
Duplicate rows
Most frequently occurring
| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | Airport_fee | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 1 | 04-01-2024 11:14 | 04-01-2024 11:14 | 1 | 0.0 | 1 | N | 193 | 193 | 2 | 3.0 | 0.00 | 0.5 | 0.0 | 0.0 | 1.0 | 4.50 | 0.0 | 0.00 | 3 |
| 0 | 1 | 02-01-2024 08:25 | 02-01-2024 08:25 | 1 | 0.0 | 1 | N | 145 | 145 | 2 | 3.0 | 0.00 | 0.5 | 0.0 | 0.0 | 1.0 | 4.50 | 0.0 | 0.00 | 2 |
| 1 | 1 | 02-01-2024 11:25 | 02-01-2024 11:25 | 1 | 0.0 | 1 | N | 236 | 264 | 2 | 3.0 | 2.50 | 0.5 | 0.0 | 0.0 | 1.0 | 7.00 | 2.5 | 0.00 | 2 |
| 2 | 1 | 02-01-2024 14:43 | 02-01-2024 14:43 | 1 | 0.0 | 1 | N | 114 | 114 | 3 | 3.0 | 2.50 | 0.5 | 0.0 | 0.0 | 1.0 | 7.00 | 2.5 | 0.00 | 2 |
| 3 | 1 | 03-01-2024 08:46 | 03-01-2024 08:46 | 1 | 0.0 | 1 | N | 137 | 264 | 2 | 3.0 | 2.50 | 0.5 | 0.0 | 0.0 | 1.0 | 7.00 | 2.5 | 0.00 | 2 |
| 4 | 1 | 03-01-2024 14:11 | 03-01-2024 14:11 | 2 | 0.0 | 1 | N | 193 | 193 | 2 | 3.0 | 0.00 | 0.5 | 0.0 | 0.0 | 1.0 | 4.50 | 0.0 | 0.00 | 2 |
| 6 | 1 | 04-01-2024 13:34 | 04-01-2024 13:34 | 2 | 0.0 | 1 | N | 193 | 193 | 2 | 3.0 | 0.00 | 0.5 | 0.0 | 0.0 | 1.0 | 4.50 | 0.0 | 0.00 | 2 |
| 7 | 1 | 04-01-2024 15:23 | 04-01-2024 15:23 | 1 | 0.0 | 1 | N | 132 | 132 | 3 | 3.0 | 1.75 | 0.5 | 0.0 | 0.0 | 1.0 | 6.25 | 0.0 | 1.75 | 2 |
| 8 | 1 | 06-01-2024 11:59 | 06-01-2024 11:59 | 0 | 0.0 | 1 | N | 145 | 145 | 2 | 3.0 | 0.00 | 0.5 | 0.0 | 0.0 | 1.0 | 4.50 | 0.0 | 0.00 | 2 |
| 9 | 1 | 07-01-2024 15:40 | 07-01-2024 15:40 | 1 | 0.0 | 1 | N | 39 | 39 | 3 | 3.0 | 0.00 | 0.5 | 0.0 | 0.0 | 1.0 | 4.50 | 0.0 | 0.00 | 2 |
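
A table like the one above can be approximated outside the report with plain pandas. The following is a minimal sketch, not the profiler's own implementation: the helper name `most_frequent_duplicates` and the three-column `toy` frame are hypothetical and only borrow column names from this dataset. It keeps every row that has an exact copy, groups on all columns, and counts group sizes.

```python
import pandas as pd


def most_frequent_duplicates(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n most frequent fully duplicated rows with their counts."""
    # Keep every row that has at least one exact copy elsewhere in the frame.
    dup = df[df.duplicated(keep=False)]
    return (
        # Group on all columns; dropna=False keeps rows whose keys contain NaN.
        dup.groupby(list(df.columns), dropna=False)
        .size()
        .reset_index(name="# duplicates")
        .sort_values("# duplicates", ascending=False, kind="stable")
        .head(n)
        .reset_index(drop=True)
    )


# Hypothetical toy data: one row repeated three times, one repeated twice,
# and one unique row that must not appear in the result.
toy = pd.DataFrame(
    {
        "VendorID": [1, 1, 1, 1, 1, 2],
        "trip_distance": [0.0, 0.0, 0.0, 0.0, 0.0, 1.72],
        "fare_amount": [3.0, 3.0, 3.0, 4.5, 4.5, 17.7],
    }
)
top = most_frequent_duplicates(toy)
print(top)
```

On the full dataset the same helper would be called on the loaded trips frame. Whether its counts match the report's "Duplicate rows" figure depends on how the profiler counts duplicates, which this sketch does not attempt to replicate.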